Constraint selection for topic-based MDI adaptation of language models
Abstract
This paper presents an unsupervised topic-based language model adaptation method that specializes the standard minimum discrimination information (MDI) approach by identifying and combining topic-specific features. A topic terminology is first acquired from a thematically coherent corpus; adaptation is then restricted to re-estimating the probabilities of n-grams that end with a topic-specific word, leaving all other probabilities untouched. Experiments are carried out on a large set of spoken documents covering various topics. The results show significant perplexity and recognition improvements that outperform classical adaptation techniques.
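As a rough illustration of the idea in the abstract, the sketch below applies the common unigram-rescaling form of MDI adaptation, P_A(w | h) = P_B(w | h) · α(w) / Z(h) with α(w) = (P_adapt(w) / P_B(w))^β, and sets α(w) = 1 for every word outside the topic terminology. This is a simplified sketch, not the paper's actual feature selection or estimation procedure: the function name `mdi_adapt_topic`, the dictionary-based model layout, and the value of `beta` are all assumptions made for illustration.

```python
from typing import Dict, Set


def mdi_adapt_topic(
    background_cond: Dict[str, float],     # P_B(w | h) for one fixed history h
    background_unigram: Dict[str, float],  # P_B(w)
    adapt_unigram: Dict[str, float],       # P_adapt(w), estimated on topic data
    topic_words: Set[str],                 # hypothetical topic terminology
    beta: float = 0.5,                     # assumed smoothing exponent
) -> Dict[str, float]:
    """Rescale P_B(w | h) with MDI-style factors applied to topic words only."""
    scaled = {}
    for word, prob in background_cond.items():
        base = background_unigram.get(word, 0.0)
        if word in topic_words and word in adapt_unigram and base > 0.0:
            # Scaling factor alpha(w) for n-grams ending with a topic word.
            alpha = (adapt_unigram[word] / base) ** beta
        else:
            # Words outside the terminology receive no scaling factor.
            alpha = 1.0
        scaled[word] = prob * alpha
    # Z(h): renormalize so the adapted distribution sums to one.
    z = sum(scaled.values())
    return {word: prob / z for word, prob in scaled.items()}


# Toy example with made-up numbers.
background_cond = {"the": 0.5, "of": 0.3, "match": 0.1, "goal": 0.1}
background_unigram = {"the": 0.5, "of": 0.3, "match": 0.01, "goal": 0.01}
adapt_unigram = {"the": 0.4, "match": 0.04, "goal": 0.01}
adapted = mdi_adapt_topic(
    background_cond, background_unigram, adapt_unigram, {"match", "goal"}
)
```

In this simplified form, non-topic words still pass through the shared normalization Z(h), so their probabilities keep their relative proportions rather than their exact values.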
Similar resources
Selection-Based Language Model for Domain Adaptation using Topic Modeling
This paper introduces a selection-based LM using topic modeling for domain adaptation, which is often required in Statistical Machine Translation. This selection-based LM slightly outperforms the state-of-the-art Moore-Lewis LM by 1.0% for EN-ES and 0.7% for ES-EN in terms of BLEU. The performance gain in terms of perplexity was 8% over the Moore-Lewis LM and 17%...
Topic Adaptation for Lecture Translation through Bilingual Latent Semantic Models
This work presents a simplified approach to bilingual topic modeling for language model adaptation by combining text in the source and target language into very short documents and performing Probabilistic Latent Semantic Analysis (PLSA) during model training. During inference, documents containing only the source language can be used to infer a full topic-word distribution on all words in the ...
An LDA-based Topic Selection Approach to Language Model Adaptation for Handwritten Text Recognition
Typically, only a very limited amount of in-domain data is available for training the language model component of a Handwritten Text Recognition (HTR) system for historical data. One has to rely on a combination of in-domain and out-of-domain data to develop language models. Accordingly, domain adaptation is a central issue in language modeling for HTR. We pursue a topic modeling approach to ha...
Efficient language model adaptation through MDI estimation
This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information. A background language model is adapted to fit constraints on its marginal distributions that are derived from new observed data. This work gives a different derivation of the model by Kneser et al. (1997) and extends its application to interpolated language models. The pr...
MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation
This paper provides a fast alternative to Minimum Discrimination Information-based language model adaptation for statistical machine translation. We provide an alternative to computing a normalization term that requires computing full model probabilities (including back-off probabilities) for all n-grams. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoo...
Publication date: 2009